A hierarchical F0 modeling method for HMM-based speech synthesis
Authors
Abstract
Conventional state-based F0 modeling in HMM-based speech synthesis systems is good at capturing micro-prosodic features but has difficulty characterizing long-term pitch patterns directly. This paper presents a hierarchical F0 modeling method to address this issue. In this method, separate F0 models capture the pitch patterns of different prosodic layers (e.g., state, phone, syllable, and word), and these models are combined in an additive structure. In model training, the F0 model of each layer is first initialized using, as training data, the residual between the original F0 values and the F0 values generated by the other layers; the F0 models of all layers are then re-estimated simultaneously under a minimum generation error (MGE) training framework. We investigate the effectiveness of hierarchical F0 modeling with different layer settings, and experimental results show that the proposed method significantly outperforms the conventional state-based F0 modeling method.
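The additive structure and the two-stage training described above can be illustrated with a deliberately simplified sketch. It assumes that each layer's F0 model is just one constant per segment (word, syllable, or a state-like unit) rather than a context-dependent HMM, and it replaces full MGE training with plain gradient descent on the squared error between a target log-F0 contour and the sum of the layer contours. The function names, segmentations, and toy log-F0 values are all hypothetical; this is not the authors' implementation.

```python
"""Toy sketch of additive, layer-wise F0 modeling.

Assumption: each layer is a piecewise-constant contour (one value per
segment), not an HMM, and "generation error" is plain squared error.
"""
from typing import List

Params = List[List[float]]  # params[l][i]: value of segment i in layer l
Segs = List[List[int]]      # segs[l][t]: segment index (0..K-1) of frame t in layer l


def layer_contour(p: List[float], s: List[int]) -> List[float]:
    """Frame-level contour contributed by one layer."""
    return [p[idx] for idx in s]


def generate(params: Params, segs: Segs, skip: int = -1) -> List[float]:
    """Additive structure: sum the contours of all layers except `skip`."""
    out = [0.0] * len(segs[0])
    for l, (p, s) in enumerate(zip(params, segs)):
        if l == skip:
            continue
        for t, v in enumerate(layer_contour(p, s)):
            out[t] += v
    return out


def generation_error(f0: List[float], params: Params, segs: Segs) -> float:
    """Squared error between the target contour and the summed layer contours."""
    return sum((g - x) ** 2 for g, x in zip(generate(params, segs), f0))


def init_layerwise(f0: List[float], segs: Segs) -> Params:
    """Fit each layer (in list order) on the residual left by the other layers."""
    params = [[0.0] * (max(s) + 1) for s in segs]
    for l, s in enumerate(segs):
        others = generate(params, segs, skip=l)  # layers not yet fitted are zero
        sums = [0.0] * len(params[l])
        counts = [0] * len(params[l])
        for t, idx in enumerate(s):
            sums[idx] += f0[t] - others[t]
            counts[idx] += 1
        # Per-segment mean of the residual minimizes squared error for this layer.
        params[l] = [a / c if c else 0.0 for a, c in zip(sums, counts)]
    return params


def joint_refine(f0: List[float], segs: Segs, params: Params,
                 lr: float = 0.1, steps: int = 200) -> Params:
    """Update all layers simultaneously by gradient descent on the error."""
    params = [list(p) for p in params]
    n = len(f0)
    for _ in range(steps):
        # One shared error signal per step, so every layer is updated jointly.
        err = [g - x for g, x in zip(generate(params, segs), f0)]
        for p, s in zip(params, segs):
            grad = [0.0] * len(p)
            for t, idx in enumerate(s):
                grad[idx] += 2.0 * err[t]
            for i in range(len(p)):
                p[i] -= lr * grad[i] / n
    return params


if __name__ == "__main__":
    # Hypothetical 16-frame log-F0 contour with nested word/syllable/state layers.
    f0 = [5.0, 5.1, 5.3, 5.2, 5.0, 4.9, 4.8, 4.9,
          5.0, 4.7, 4.6, 4.5, 4.6, 4.4, 4.3, 4.2]
    segs = [
        [t // 8 for t in range(16)],  # word layer: 2 segments
        [t // 4 for t in range(16)],  # syllable layer: 4 segments
        [t // 2 for t in range(16)],  # state-like layer: 8 segments
    ]
    init = init_layerwise(f0, segs)
    refined = joint_refine(f0, segs, init)
    print(generation_error(f0, init, segs), generation_error(f0, refined, segs))
```

Because the toy layers are strictly nested and each is only a per-segment constant, a single layer-wise pass already reaches the least-squares fit here, so the joint step leaves the error essentially unchanged; in this sketch the joint update is shown only to make the "re-estimate all layers at once" idea concrete.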
Similar articles
Superpositional Modeling of Fundamental Frequency Contours for HMM-based Speech Synthesis
Statistical parametric speech synthesis technologies, such as HMM-based and DNN-based ones, have gained special attention from researchers because of their ability to generate speech in various voice qualities and styles. In these methods, all acoustic parameters (except durational ones) are handled in a frame-by-frame manner, which is not appropriate for prosodic features. Although relation of adja...
Discontinuous Observation HMM for Prosodic-Event-Based F0 Generation
This paper examines F0 modeling and generation techniques for spontaneous speech synthesis. In a previous study, we proposed a prosodic-unit HMM in which the synthesis unit is defined as a segment between two prosodic events represented by a ToBI label framework. To take advantage of the prosodic-unit HMM, continuous F0 sequences must be modeled from discontinuous F0 data including unvoiced r...
Improved Automatic Extraction of Generation Process Model Commands and Its use for Generating Fundamental Frequency Contours for Training HMM-based Speech Synthesis
The generation process model of fundamental frequency (F0) contours can represent the F0 movements of speech well while keeping a clear relation to the linguistic information of utterances. Therefore, using the model is expected to improve HMM-based speech synthesis. One of the major problems preventing the use of the model is that the performance of automatic extraction of the model parameters from observ...
Extracting MFCC, F0 feature in Vietnamese HMM-based speech synthesis
The HMM-based statistical speech synthesis method does not require a very large speech corpus for training. In this system, statistical modeling is applied to learn distributions of context-dependent acoustic vectors extracted from speech signals, each vector containing a suitable parametric representation of one speech frame and Vietnamese phonetic rules to synthesize speech. The method...
Generation of Fundamental Frequency Contours of Mandarin in HMM-based Speech Synthesis using Generation Process Model
The HMM-based speech synthesis system can produce high-quality synthetic speech with flexible modeling of spectral and prosodic parameters. In this approach, short-term spectra, fundamental frequency (F0), and duration are generated separately by multi-stream HMMs. However, the quality of synthetic speech degrades when the feature vectors used in training are noisy. Among all noisy features, pitch tr...
Journal title:
Volume, issue:
Pages: -
Publication date: 2010